Goto

Collaborating Authors

 automatic classification


Automatic classification of stop realisation with wav2vec2.0

Tanner, James, Sonderegger, Morgan, Stuart-Smith, Jane, Mielke, Jeff, Kendall, Tyler

arXiv.org Artificial Intelligence

Modern phonetic research regularly makes use of automatic tools for the annotation of speech data, however few tools exist for the annotation of many variable phonetic phenomena. At the same time, pre-trained self-supervised models, such as wav2vec2.0, have been shown to perform well at speech classification tasks and latently encode fine-grained phonetic information. We demonstrate that wav2vec2.0 models can be trained to automatically classify stop burst presence with high accuracy in both English and Japanese, robust across both finely-curated and unprepared speech corpora. Patterns of variability in stop realisation are replicated with the automatic annotations, and closely follow those of manual annotations. These results demonstrate the potential of pre-trained speech models as tools for the automatic annotation and processing of speech corpus data, enabling researchers to 'scale-up' the scope of phonetic research with relative ease.


Text classification using machine learning methods

Oancea, Bogdan

arXiv.org Artificial Intelligence

In this paper we present the results of an experiment aimed to use machine learning methods to obtain models that can be used for the automatic classification of products. In order to apply automatic classification methods, we transformed the product names from a text representation to numeric vectors, a process called word embedding. We used several embedding methods: Count Vectorization, TF-IDF, Word2Vec, FASTTEXT, and GloVe. Having the product names in a form of numeric vectors, we proceeded with a set of machine learning methods for automatic classification: Logistic Regression, Multinomial Naive Bayes, kNN, Artificial Neural Networks, Support Vector Machines, and Decision trees with several variants. The results show an impressive accuracy of the classification process for Support Vector Machines, Logistic Regression, and Random Forests. Regarding the word embedding methods, the best results were obtained with the FASTTEXT technique.


Leveraging AI for Automatic Classification of PCOS Using Ultrasound Imaging

Divekar, Atharva, Sonawane, Atharva

arXiv.org Artificial Intelligence

The AUTO-PCOS Classification Challenge seeks to advance the diagnostic capabilities of artificial intelligence (AI) in identifying Polycystic Ovary Syndrome (PCOS) through automated classification of healthy and unhealthy ultrasound frames. This report outlines our methodology for building a robust AI pipeline utilizing transfer learning with the InceptionV3 architecture to achieve high accuracy in binary classification. Preprocessing steps ensured the dataset was optimized for training, validation, and testing, while interpretability methods like LIME and saliency maps provided valuable insights into the model's decision-making. Our approach achieved an accuracy of 90.52%, with precision, recall, and F1-score metrics exceeding 90% on validation data, demonstrating its efficacy. The project underscores the transformative potential of AI in healthcare, particularly in addressing diagnostic challenges like PCOS. Key findings, challenges, and recommendations for future enhancements are discussed, highlighting the pathway for creating reliable, interpretable, and scalable AI-driven medical diagnostic tools.


Automatic Classification of General Movements in Newborns

Chopard, Daphné, Laguna, Sonia, Chin-Cheong, Kieran, Dietz, Annika, Badura, Anna, Wellmann, Sven, Vogt, Julia E.

arXiv.org Artificial Intelligence

General movements (GMs) are spontaneous, coordinated body movements in infants that offer valuable insights into the developing nervous system. Assessed through the Prechtl GM Assessment (GMA), GMs are reliable predictors for neurodevelopmental disorders. However, GMA requires specifically trained clinicians, who are limited in number. To scale up newborn screening, there is a need for an algorithm that can automatically classify GMs from infant video recordings. This data poses challenges, including variability in recording length, device type, and setting, with each video coarsely annotated for overall movement quality. In this work, we introduce a tool for extracting features from these recordings and explore various machine learning techniques for automated GM classification.

  Country:
  Genre: Research Report > New Finding (0.94)
  Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Automatic Classification of News Subjects in Broadcast News: Application to a Gender Bias Representation Analysis

Pelloin, Valentin, Dodson, Lena, Chapuis, Émile, Hervé, Nicolas, Doukhan, David

arXiv.org Artificial Intelligence

This paper introduces a computational framework designed to delineate gender distribution biases in topics covered by French TV and radio news. We transcribe a dataset of 11.7k hours, broadcasted in 2023 on 21 French channels. A Large Language Model (LLM) is used in few-shot conversation mode to obtain a topic classification on those transcriptions. Using the generated LLM annotations, we explore the finetuning of a specialized smaller classification model, to reduce the computational cost. To evaluate the performances of these models, we construct and annotate a dataset of 804 dialogues. This dataset is made available free of charge for research purposes. We show that women are notably underrepresented in subjects such as sports, politics and conflicts. Conversely, on topics such as weather, commercials and health, women have more speaking time than their overall average across all subjects. We also observe representations differences between private and public service channels.


Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention

Meng, Fanqi, Wang, Xuesong, Wang, Jingdong, Wang, Peifang

arXiv.org Artificial Intelligence

With the rapid growth of software scale and complexity, a large number of bug reports are submitted to the bug tracking system. In order to speed up defect repair, these reports need to be accurately classified so that they can be sent to the appropriate developers. However, the existing classification methods only use the text information of the bug report, which leads to their low performance. To solve the above problems, this paper proposes a new automatic classification method for bug reports. The innovation is that when categorizing bug reports, in addition to using the text information of the report, the intention of the report (i.e. suggestion or explanation) is also considered, thereby improving the performance of the classification. First, we collect bug reports from four ecosystems (Apache, Eclipse, Gentoo, Mozilla) and manually annotate them to construct an experimental data set. Then, we use Natural Language Processing technology to preprocess the data. On this basis, BERT and TF-IDF are used to extract the features of the intention and the multiple text information. Finally, the features are used to train the classifiers. The experimental result on five classifiers (including K-Nearest Neighbor, Naive Bayes, Logistic Regression, Support Vector Machine, and Random Forest) show that our proposed method achieves better performance and its F-Measure achieves from 87.3% to 95.5%.


Automatic Classification of Sexual Harassment Cases

#artificialintelligence

In our case, the data was provided by Safecity India, which is a platform launched on 2012, that crowdsources personal stories of sexual harassment and abuse in public spaces [2]. They have collected over 10,000 stories from over 50 cities in India, Kenya, Cameroon, and Nepal. More specifically they provided us a .cvs Additionally to the focal tasks of this project and as part of the NLP channel we decided to automate the category classification based on the sexual harassment case descriptions. Performing this classification task manually is time-consuming and leaving it entirely on the hands of the victim could produce ambiguity in the discrimination of the categories.


Unsupervised automatic classification of Scanning Electron Microscopy (SEM) images of CD4+ cells with varying extent of HIV virion infection

Wandeto, John M., Dresp-Langley, Birgitta

arXiv.org Artificial Intelligence

Archiving large sets of medical or cell images in digital libraries may require ordering randomly scattered sets of image data according to specific criteria, such as the spatial extent of a specific local color or contrast content that reveals different meaningful states of a physiological structure, tissue, or cell in a certain order, indicating progression or recession of a pathology, or the progressive response of a cell structure to treatment. Here we used a Self Organized Map (SOM)-based, fully automatic and unsupervised, classification procedure described in our earlier work and applied it to sets of minimally processed grayscale and/or color processed Scanning Electron Microscopy (SEM) images of CD4+ T-lymphocytes (so-called helper cells) with varying extent of HIV virion infection. It is shown that the quantization error in the SOM output after training permits to scale the spatial magnitude and the direction of change (+ or -) in local pixel contrast or color across images of a series with a reliability that exceeds that of any human expert. The procedure is easily implemented and fast, and represents a promising step towards low-cost automatic digital image archiving with minimal intervention of a human operator.


Automatic Classification of Pathology Reports using TF-IDF Features

Kalra, Shivam, Li, Larry, Tizhoosh, Hamid R.

arXiv.org Machine Learning

A Pathology report is arguably one of the most important documents in medicine containing interpretive information about the visual findings from the patient's biopsy sample. Each pathology report has a retention period of up to 20 years after the treatment of a patient. Cancer registries process and encode high volumes of free-text pathology reports for surveillance of cancer and tumor diseases all across the world. In spite of their extremely valuable information they hold, pathology reports are not used in any systematic way to facilitate computational pathology. Therefore, in this study, we investigate automated machine-learning techniques to identify/predict the primary diagnosis (based on ICD-O code) from pathology reports. We performed experiments by extracting the TF-IDF features from the reports and classifying them using three different methods---SVM, XGBoost, and Logistic Regression. We constructed a new dataset with 1,949 pathology reports arranged into 37 ICD-O categories, collected from four different primary sites, namely lung, kidney, thymus, and testis. The reports were manually transcribed into text format after collecting them as PDF files from NCI Genomic Data Commons public dataset. We subsequently pre-processed the reports by removing irrelevant textual artifacts produced by OCR software. The highest classification accuracy we achieved was 92\% using XGBoost classifier on TF-IDF feature vectors, the linear SVM scored 87\% accuracy. Furthermore, the study shows that TF-IDF vectors are suitable for highlighting the important keywords within a report which can be helpful for the cancer research and diagnostic workflow. The results are encouraging in demonstrating the potential of machine learning methods for classification and encoding of pathology reports.


Automatic classification of trees using a UAV onboard camera and deep learning

Onishi, Masanori, Ise, Takeshi

arXiv.org Machine Learning

Automatic classification of trees using remotely sensed data has been a dream of many scientists and land use managers. Recently, Unmanned aerial vehicles (UAV) has been expected to be an easy-to-use, cost-effective tool for remote sensing of forests, and deep learning has attracted attention for its ability concerning machine vision. In this study, using a commercially available UAV and a publicly available package for deep learning, we constructed a machine vision system for the automatic classification of trees. In our method, we segmented a UAV photography image of forest into individual tree crowns and carried out object-based deep learning. As a result, the system was able to classify 7 tree types at 89.0% accuracy. This performance is notable because we only used basic RGB images from a standard UAV. In contrast, most of previous studies used expensive hardware such as multispectral imagers to improve the performance. This result means that our method has the potential to classify individual trees in a cost-effective manner. This can be a usable tool for many forest researchers and managements.